Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings

Authors

Masayuki Suzuki

Ryuki Tachibana

Samuel Thomas

Bhuvana Ramabhadran

George Saon

Abstract

Adapting Automatic Speech Recognition (ASR) systems to a new domain (channel, speaker, topic, etc.) remains a significant challenge, because often only a limited amount of target-domain data is available for adapting the Acoustic Models (AMs). Moreover, unlike for GMMs, there is to date no established, efficient method for adapting current state-of-the-art Convolutional Neural Network (CNN)-based AMs. In this paper, we explore various training algorithms for domain adaptation of CNN-based speech recognition systems with limited acoustic training data. Our investigation makes three main contributions. First, adding a weight-decay-based regularizer to the standard cross-entropy criterion can significantly improve recognition performance with as little as one hour of adaptation data. Second, the observed gains can be improved further with state-level Minimum Bayes Risk (sMBR) sequence training. In addition to supervised training with limited amounts of data, we also study the effect of introducing unsupervised data at both the initial cross-entropy stage and the subsequent sequence training stage; our experiments show that unsupervised data helps under both criteria. Third, we investigate the effect of speaker diversity in the adaptation data: our experiments show that, although final performance can vary widely depending on the speakers selected, regularization is required to obtain significant gains. Overall, we demonstrate that adapting neural-network-based acoustic models yields performance improvements of up to 24.8% relative.
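As a rough illustration of the first contribution, the sketch below adds an L2 weight-decay penalty to a frame-level cross-entropy loss. The abstract does not give the exact form of the regularizer, so everything here is an assumption: the function names, the `decay` value, and the optional `anchor` argument (which penalizes distance from the source-domain weights rather than from zero) are hypothetical and are not taken from the paper. The `relative_improvement` helper only shows what a "relative" gain means; any numbers passed to it are illustrative.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean frame-level cross entropy for integer class labels."""
    # Subtract the per-row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def regularized_loss(logits, labels, weights, decay=1e-3, anchor=None):
    """Cross entropy plus an L2 penalty on the weights (assumed form).

    anchor=None penalizes ||W||^2 (plain weight decay). Passing the
    source-domain weights as `anchor` penalizes ||W - W_src||^2 instead.
    Which variant the paper uses is not stated in the abstract.
    """
    if anchor is None:
        anchor = [np.zeros_like(w) for w in weights]
    penalty = sum(np.sum((w - a) ** 2) for w, a in zip(weights, anchor))
    return cross_entropy(logits, labels) + 0.5 * decay * penalty


def relative_improvement(baseline_error, adapted_error):
    """Relative error reduction in percent."""
    return 100.0 * (baseline_error - adapted_error) / baseline_error
```

With `anchor` set to the unadapted model's weights, the penalty is zero at the start of adaptation and grows as the adapted model drifts away from the source model, which is one common way to limit overfitting when only about an hour of data is available.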

Similar resources

Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition

Recently, we have proposed a novel fast adaptation method for the hybrid DNN-HMM models in speech recognition [1]. This method relies on learning an adaptation NN that is capable of transforming input speech features for a certain speaker into a more speaker independent space given a suitable speaker code. Speaker codes are learned for each speaker during adaptation. The whole multi-speaker tra...

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy continues to improve, a considerable gap remains between their accuracy and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models

Using auxiliary input features has been seen as one of the most effective ways to adapt deep neural network (DNN)-based acoustic models to speaker or environment. However, this approach has several limitations. It only performs compensation of the bias term of the hidden layer and therefore does not fully exploit the network capabilities. Moreover, it may not be well suited for certain types of...

Methods for task adaptation of acoustic models with limited transcribed in-domain data

Application specific acoustic models provide the best recognition accuracy, but they are expensive, because they require the transcription of tens or hundreds of hours of in-domain speech for training. Therefore, this paper focuses on the acoustic model estimation given limited in-domain transcribed speech data, and large amounts of (typically available) transcribed out-of-domain data. First, w...

Deep Domain Confusion: Maximizing for Domain Invariance

Recent reports suggest that a generic supervised deep CNN model trained on a large-scale dataset reduces, but does not remove, dataset bias on a standard benchmark. Fine-tuning deep models in a new domain can require a significant amount of data, which for many applications is simply not available. We propose a new CNN architecture which introduces an adaptation layer and an additional domain c...

Publication date: 2016
